Computational and Structural Biotechnology Journal
● American Association for the Advancement of Science (AAAS)
Preprints posted in the last 30 days, ranked by how well they match Computational and Structural Biotechnology Journal's content profile, based on 216 papers previously published here. The average preprint has a 0.30% match score for this journal, so anything above that is already an above-average fit.
Matsingos, C.; Lot, I.; Vaz, M.; Mailliart, J.; Boulayat, M.; Debacker, C.; Goupil-Lamy, A.; Gasnier, B.; Acher, F. C.; Anne, C.
Salla disease is caused by a genetic mutation in sialin, a lysosomal membrane transporter, which exports sialic acid from lysosomes. Substrate translocation occurs via a rocker-switch mechanism that alternately exposes the substrate-binding site to the lysosomal lumen and the cytosol. The pathogenic mutation R39C found in most Salla disease patients decreases the lysosomal localisation and the transport activity. In this study, we used computational and mutagenesis approaches to elucidate the molecular effects of the R39C mutation. Using three-dimensional models of human sialin in the lumen-open (LO) and cytosol-open (CO) states combined with the mutagenesis of selected residues, we identify a critical "triplet" motif comprising R39, E194, and E262, which is associated with an ionic lock formed between K197 and D350 in the LO conformation. Molecular dynamics simulations suggest that the electrostatic triplet negatively modulates the ionic lock, and are consistent with a strengthened ionic lock in R39C sialin, potentially favouring the LO state. To assess the global effects of the R39C mutation, we computed dynamic cross-correlation matrices and identified correlation patterns consistent with an allosteric coupling between the ionic lock K197/D350 and the region surrounding the sialic acid binding site in wild-type sialin, whereas in the LO state of R39C sialin, this communication preferentially bypasses this region. Therefore, the R39C mutation may impede the LO to CO conformational transition required for sialic acid transport, providing a plausible mechanistic framework for the decreased transport activity, and possibly the decreased lysosomal localisation, observed in Salla disease. 
Highlights: The R39 residue participates in an interaction triplet, which negatively regulates an ionic lock stabilising the lumen-open conformation. The R39C mutation is associated with a stronger ionic lock in the simulations, and may favour the lumen-open state. Correlation network analysis suggests an allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site. The R39C mutation alters the inferred allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site. Graphical abstract: Figure 1.
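The dynamic cross-correlation matrices mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the trajectory is already aligned to a reference and stored as a NumPy array of shape (frames, atoms, 3); the function name `dccm` and the input layout are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def dccm(coords):
    """Normalized dynamic cross-correlation matrix.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be
    pre-aligned to a common reference (hypothetical input layout).
    Returns an (n_atoms, n_atoms) matrix with entries in [-1, 1].
    """
    coords = np.asarray(coords, dtype=float)
    # Displacement of every atom from its time-averaged position.
    disp = coords - coords.mean(axis=0)
    # Time average of the dot product <dr_i . dr_j> for every atom pair.
    cov = np.einsum("fix,fjx->ij", disp, disp) / coords.shape[0]
    # Normalize by the root-mean-square fluctuation of each atom.
    rmsf = np.sqrt(np.diag(cov))
    return cov / np.outer(rmsf, rmsf)
```

Entries near +1 indicate atoms moving in the same direction, and entries near -1 indicate anti-correlated motion; comparing such matrices between wild-type and mutant trajectories is one common way to look for altered coupling.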
Raval, M.; Zhou, Y.; Lynch, M.; Krizanc, D.; Thayer, K.; Weir, M. P.
Protein translation is a highly regulated process influenced by multiple factors at the initiation, elongation, and termination stages. One notable regulatory element of the ribosome is the CAR interaction surface, a three-residue motif in the structure of the ribosome composed of C1274 and A1427 of S. cerevisiae 18S rRNA (corresponding to C1054 and A1196 in E. coli 16S rRNA) and R146 of ribosomal protein Rps3. CAR is highly conserved and positioned adjacent to the aminoacyl-site (A site) decoding center. It establishes hydrogen bonds with the +1 codon next in line to enter the ribosome A site, acting as an extension of the tRNA anticodon and forming base-stacking interactions with nucleotide 34 of the tRNA. However, despite CAR's enzymatically strategic positioning within the ribosome, its functional relationship with the A site remains poorly characterized. Using molecular dynamics (MD) simulations, we examined the interplay between the A site and CAR site, revealing sequence-dependent modulation of H-bonding and π-stacking interactions within and between the two sites. These findings highlight the interplay between the A site and CAR site, suggesting a structural and functional connection between these two regions of the ribosome that may contribute to mRNA sequence-specific tuning of translation elongation. Graphical abstract: Figure 1.
LEON FOUN LIN, R.; Bellaiche, A.; Diharce, J.; Etchebest, C.
Like other proteins, monoclonal antibodies, which are important biodrugs, are subject to post-translational modifications, especially N-glycosylation. However, the effect of N-glycosylation remains poorly studied, and atomistic details about its influence are rarely available. Moreover, the few existing studies focus on the prevalent immunoglobulin G1. To better understand the impact of glycosylation, we carried out a comparative exploration of the effect of N-glycosylation on two different subclasses of antibodies, namely Mab231, an IgG2, and pembrolizumab, an IgG4. The two antibodies differ in sequence, length, and 3D structure, and also in the location and composition of their glycans. In the present work, detailed information was gained through molecular dynamics simulations in which both monoclonal antibodies were studied with and without their glycans. The results of 1.5 µs of sampling for each system show that glycosylation does not drastically alter the overall conformational landscape of either antibody, whatever the metrics considered. However, it measurably modulates local flexibility, inter-domain correlated motions, and the relative orientation of the Fab arms with respect to the Fc domain, with statistically significant shifts in key geometric descriptors. Importantly, contact analysis reveals that glycan interactions extend beyond the Fc region to reach Fab residues. The allosteric network calculations demonstrate that the influence of Fc-bound glycans propagates as far as the Fab framework regions in both mAbs, which could affect antigen binding. The nature and magnitude of these effects are subclass-dependent, reflecting differences in glycan composition, hinge architecture, and three-dimensional organization. Our findings challenge the prevailing view that Fc glycosylation uniformly promotes CH2 domain opening.
More importantly, they underscore the necessity of considering full-length structures and IgG subclass diversity in glyco-engineering strategies.
Goryanin, I.; Checkley, S.; Demin, O.; Goryanin, I.
Background: Quantitative systems pharmacology (QSP) models provide mechanistic insight into drug response but are limited by labor-intensive, expert-driven workflows. We developed an AI-assisted QSP (AI-QSP) framework that integrates large language models (LLMs) with SBML-based modeling to enable automated reconstruction, extension, and calibration of mechanistic models. Methods: The framework was applied to a published CAR-T QSP model. The model was reconstructed in SBML and extended via LLM-guided prompts to incorporate key resistance mechanisms: T-cell exhaustion, PD-1/PD-L1 checkpoint regulation, and tumor antigen escape. Model development followed an iterative expert-in-the-loop workflow. The resulting model (21 reactions, 9 species) was calibrated to synthetic benchmark data using 19-parameter optimization. Model credibility was assessed using ASME V&V 40 and ICH M15 principles, including global sensitivity and profile-likelihood analyses. Results: The calibrated model reproduced benchmark dynamics with high accuracy (mean log-RMSE = 0.132). Sensitivity analysis identified CAR-T killing and bystander cytotoxicity as dominant drivers of tumor response. Profile-likelihood analysis showed 71% of parameters were practically identifiable, with remaining parameters prioritised for future data-driven refinement. Conclusions: AI-assisted QSP modeling enables reproducible, scalable model reconstruction and evolution while maintaining mechanistic transparency and regulatory alignment. This framework provides a foundation for accelerating model-informed drug development in cell and gene therapies.
Messa, P. E.; Warren, C. L.; Nicol, N. R.; Pearson, K. S.; Peters, J. P.; Fowler, A. M.; Alarid, E. T.; Ozers, M. S.
Grainyhead-like 2 (GRHL2) is an epithelial transcription factor with context-dependent regulatory roles, yet the sequence rules governing its DNA recognition remain incompletely defined. In this study, a high-density genomic Specificity and Affinity for Protein (SNAP) DNA-binding array containing 772,732 tiled probes derived from GRHL2 ChIP-seq regions was used to resolve GRHL2 binding specificity at 6-base-pair resolution across genomic sequences. From high-affinity probes, de novo motif analysis recovered the canonical 5'-AACCGGTT-3' motif. Sequence specificity landscapes revealed a stepwise reduction in binding as mismatches were introduced, with the strongest effects at the C (position 3) and G (position 6) within the motif, greater tolerance at the central CG dinucleotide, and intermediate tolerance at the A/T bases at the motif edges. This analysis also demonstrated the influence of nearby flanking sequences. Extended motif and spacing analyses indicated dimeric binding at paired motifs, with periodic helical spacing consistent with interactions on the same face of the DNA helix. Integration of SNAP array binding with ChIP-seq data distinguished direct, motif-encoded GRHL2 occupancy from indirect, cofactor-mediated recruitment at genomic sites. These results define the sequence specificity of GRHL2 interactions with variations in the DNA consensus motif and flanking sequences within an endogenous genomic context. Graphical abstract: Figure 1.
Subramanian, N.; Kumar, S. P.; Rengaswamy, R.; Bhatt, N. P.; Narayanan, M.
Predicting cellular behaviors, a central task in systems biology and metabolic engineering, can be enhanced through integrative modeling of processes such as gene regulation and metabolism. Information flow from gene regulation (modeled via a gene regulatory network) to metabolism (modeled via a genome-scale metabolic model) is well-studied, but the reciprocal regulation of genes by metabolites is less explored. We introduce CausalFlux, a method that models bidirectional feedback between genes and metabolites, in order to predict steady-state reaction fluxes under wild-type (WT) or perturbed (e.g., gene knockout/KO) conditions. CausalFlux does so by iteratively performing causal surgery on a Bayesian gene regulatory network and constraint-based analysis of a coupled metabolic model. CausalFlux enabled us to assess the impact of two-way feedback in several testbed models and real-world biological systems by comparing its predictions to those of TRIMER, a state-of-the-art model of gene-to-metabolite one-way feedback. Incorporating bidirectional feedback, as in CausalFlux, improved the Spearman correlation between actual and predicted fluxes in 92% of the 39 distinct simulation conditions relative to TRIMER. For predicting growth/no-growth phenotype following single-gene KOs in E. coli, CausalFlux achieved a balanced accuracy of 0.79 in identifying essential genes, and TRIMER achieved 0.71 for the same task, again highlighting the importance of modeling two-way feedback. In ablation studies that further dissect the role of specific metabolite→gene feedback edges in E. coli, the F1 scores of gene essentiality predictions decreased by 7.5% and 13% upon ablation of feedback edges from any metabolite to the crp gene and the 10 metabolic feedback genes with the highest influence on the KO genes, respectively. Finally, we highlight the application of CausalFlux to predict the essentiality of several hundred genes under different media conditions.
Overall, our findings show that CausalFlux can crucially utilize information on feedback metabolites to predict trends in reaction fluxes and qualitative (growth/no-growth) outcomes, thereby encouraging future systems modeling efforts to carefully incorporate not only gene-to-metabolite but also metabolite-to-gene interactions. Availability: Code pertaining to the CausalFlux method, and its benchmarking and application, is publicly available at: https://github.com/BIRDSgroup/CausalFlux. Author summary: The myriad processes within a living cell, such as gene regulation or metabolism, are tightly interconnected. Modeling these interconnected processes can offer a deeper mechanistic understanding of cellular behaviors, as well as guide efforts that engineer the metabolic output of a cell. In this work, we develop a novel integrated model of gene regulation and metabolism that incorporates bidirectional feedback between these two processes, via the concept of metabolite-induced causal surgery on a gene regulatory network and gene-induced constraints on the fluxes of metabolic reactions. Our model, which we call CausalFlux, represents an advance over most existing models that capture just the one-way gene-to-metabolism feedback (i.e., genes coding for enzymes that control metabolic reactions). Our CausalFlux methodology opens up a unique opportunity to quantify the impact of two-way feedback in gene-metabolite systems, via comparison of CausalFlux's predictions to those of TRIMER, a published model incorporating one-way feedback alone. For predicting reaction fluxes in testbed models and essential genes in E. coli, quantitative comparison of the performance of CausalFlux vs. TRIMER showed that accounting for two-way feedback leads to more accurate and biologically meaningful predictions. CausalFlux also enabled us to quantify the effect of two-way feedback by comparing prediction performance before and after ablation of certain feedback edges from metabolites to genes.
Overall, our findings highlight the importance of modeling gene regulation and metabolism as two-way interconnected systems within a living cell, and encourage future works to incorporate gene↔metabolite feedback into their analyses.
Melo, R.; Viegas, T.
Single-chain variable fragments (scFvs) are widely used in diagnostic and therapeutic applications. These antibody fragments comprise two antibody variable domains connected by a flexible peptide linker whose properties critically influence folding, stability, oligomeric state, and antigen-binding. Therefore, careful linker selection represents a key step in scFv design. Guanylyl Cyclase C (GUCY2C) is a tumor-associated cell surface receptor expressed in gastrointestinal malignancies, including more than 90% of colorectal cancer (CRC) cases across all disease stages. Its restricted physiological expression pattern makes GUCY2C an attractive target for immunotherapy and precision oncology therapies. Here, we investigated the structural and functional consequences of incorporating alternative linker designs into an anti-GUCY2C scFv. Using molecular modeling, protein-protein docking, and molecular dynamics (MD) simulations, we evaluated the conformational stability, interdomain organization, and antigen-binding interactions of each construct. Our results provide a dynamic, structure-based assessment of how linker composition influences GUCY2C recognition and scFv structural behavior. Furthermore, this work establishes a computational framework for the rational optimization of GUCY2C-targeted antibody fragments.
Roule, T.; Akizu, N.
Background: Despite their widespread use, quantitative comparison of epigenomic datasets such as ChIP-seq and CUT&RUN remains challenging, particularly due to difficulties in signal normalization across samples and conditions. Normalization based solely on sequencing depth is often insufficient because signal-to-noise ratios vary widely across samples, even within the same experiment. While exogenous spike-in normalization can address some issues, robust spike-in controls are not always available, and may introduce additional experimental burden and computational complexity. Furthermore, normalization and differential binding analysis are typically performed using separate bioinformatics tools. Indeed, most differential analysis frameworks operate on raw count matrices, preventing users from visually inspecting normalized signal tracks and evaluating how normalization influences the results. To overcome these challenges, we developed GNOMES (Genome-wide NOrmalization of Mapped Epigenomic Signals), a framework that integrates signal normalization, quality control, and differential binding analysis within a unified workflow. Results: GNOMES is a user-friendly tool able to process ChIP-seq and CUT&RUN datasets from aligned reads, and generate normalized coverage profiles and differential binding results. The tool implements a robust genome-wide normalization strategy based on percentile scaling of signal local maxima, enabling stable normalization between biological replicates and conditions. GNOMES supports both single- and paired-end sequencing, does not require a negative control (input or IgG), and can be applied to both broad (histone marks) and narrow (transcription factor) enrichment patterns. The workflow includes normalization, optional consensus peak identification, and differential binding analysis.
For each step, GNOMES generates extensive quality-control metrics and visual outputs, including normalized bigWig tracks, median signal tracks, BED files of regions with significant changes, and diagnostic plots such as heatmaps and PCA. GNOMES is highly configurable and integrates established tools such as MACS2 for identifying candidate peak regions for differential binding analysis, as well as DESeq2 and edgeR for statistical testing. Finally, GNOMES is organism-agnostic and can be applied to epigenomic datasets from any model system. Conclusions: GNOMES provides an integrated and highly customizable environment for normalization and differential binding analysis of epigenomic sequencing data. By integrating signal normalization with downstream statistical methods for differential binding analysis and comprehensive quality control, GNOMES simplifies the analysis of ChIP-seq and CUT&RUN datasets for the identification of chromatin changes.
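The idea of percentile scaling of signal local maxima can be sketched as follows. This is an illustrative reading of the abstract, not the GNOMES implementation: the function names, the 99th-percentile default, and the common target value are all assumptions.

```python
import numpy as np

def local_maxima(signal):
    """Return values of interior points higher than the left neighbour
    and at least as high as the right neighbour."""
    s = np.asarray(signal, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    return s[1:-1][is_peak]

def scaling_factor(signal, percentile=99.0, target=1.0):
    """Factor that maps the chosen percentile of local maxima to `target`.

    `percentile` and `target` are hypothetical defaults for illustration.
    """
    peaks = local_maxima(signal)
    if peaks.size == 0:
        raise ValueError("no local maxima found in signal")
    reference = np.percentile(peaks, percentile)
    return target / reference
```

Under these assumptions, two coverage tracks that differ only by a global multiplicative factor end up on the same scale after each is multiplied by its own `scaling_factor`, which is the property a depth-independent normalization needs.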
Hamid, A.; Akasha, N.; Mukumbi, P. K.; Mirghani, A.; Omer, T.
This article presents the development of an advanced modeling and simulation platform for carbon capture systems, with a focus on integrated process analysis from upstream CO2 capture through to bioethanol production. The platform supports the evaluation of CO2 mitigation technology by coupling mathematical bioprocess models with an interactive desktop application. The biological system employs Chlorella vulgaris microalgae to fix CO2 through photosynthesis and generate carbohydrate substrates, which are subsequently converted to bioethanol by Saccharomyces cerevisiae yeast via fermentation. The simulation integrates three established kinetic models (the Monod, Logistic, and Luedeking-Piret models) to predict biomass growth, substrate consumption, and ethanol yield under varying operational conditions. A closed-loop CO2 recycling subsystem captures fermentation off-gases and reintroduces them into the bioreactor, enhancing overall carbon utilization efficiency. Three representative simulation scenarios demonstrated process efficiencies ranging from 1.09% to 93.78% of the theoretical maximum CO2-to-ethanol conversion efficiency, confirming the platform's capacity to evaluate a wide operational envelope. The Electron/React-based desktop application provides real-time visualization, interactive 3D bioreactor models, and a simulation history module, making it accessible to researchers, engineers, and students. The platform serves as a digital twin that bridges rigorous bioprocess mathematics with intuitive user interaction, providing a cost-effective tool for designing and optimizing sustainable carbon capture and biofuel production systems.
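The abstract names the three kinetic models but not how they are coupled. The following is a minimal sketch of one common coupling, assuming a Monod-limited specific growth rate, a logistic cap on biomass, Luedeking-Piret product formation, and substrate use proportional to growth; all parameter names and values are hypothetical and integration is by a simple explicit Euler step.

```python
def simulate(x0=0.1, s0=20.0, p0=0.0, mu_max=0.4, ks=0.5, x_max=10.0,
             y_xs=0.5, alpha=2.0, beta=0.05, dt=0.01, t_end=48.0):
    """Integrate biomass x, substrate s, and product p (illustrative units).

    All default parameter values are hypothetical placeholders.
    """
    x, s, p = x0, s0, p0
    for _ in range(int(round(t_end / dt))):
        mu = mu_max * s / (ks + s)           # Monod: substrate-limited growth rate
        dx = mu * x * (1.0 - x / x_max)      # Logistic: growth slows near x_max
        dp = alpha * dx + beta * x           # Luedeking-Piret: growth- and biomass-linked product
        ds = -dx / y_xs                      # Substrate consumed per unit biomass formed
        x += dx * dt
        p += dp * dt
        s = max(s + ds * dt, 0.0)            # Substrate cannot go negative
    return x, s, p
```

Calling `simulate()` with different `mu_max`, `ks`, or `y_xs` values gives a rough feel for how operating conditions shift the final biomass and product, which is the kind of scenario comparison the abstract describes.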
Kenavdekar, M. V.; Natarajan, E.
The human gut microbiome plays a critical role in host health, yet its functional organization in disease remains poorly understood. Most studies focus on taxonomic composition or pathway abundance, which fail to capture higher-order interactions governing system-level behavior. Here, we investigated microbiome functional organization in inflammatory bowel disease (IBD), including Crohn's disease (CD), ulcerative colitis (UC), and healthy controls (HC), using a network-based framework across 60 metagenomic samples. Functional pathway profiles were used to construct correlation-based interaction networks, followed by analysis of network topology, functional redundancy, keystone pathway architecture, and system robustness. Disease-associated networks (CD and UC) exhibited reduced global connectivity, increased modular fragmentation, and centralization of keystone pathways, indicating a shift from distributed organization to more fragmented and fragile network structures compared to healthy controls. Notably, machine learning models demonstrated that network-derived features achieved higher classification performance (accuracy up to 0.824) compared to redundancy-based measures. These findings reveal that microbiome dysfunction in IBD is driven by large-scale reorganization of functional interaction networks rather than loss of functional capacity. This study highlights the importance of network-level analysis in understanding microbiome-associated disease and provides a systems-level framework for future research.
Secker, C.; Secker, P.; Yergoez, F.; Celik, M. O.; Chewle, S.; Phuong Nga Le, M.; Masoud, M.; Christgau, S.; Weber, M.; Gorgulla, C.; Nigam, A.; Pollice, R.; Schuette, C.; Fackeldey, K.
The identification of suitable lead molecules in the vast chemical space is a critical and challenging task in drug discovery campaigns. Recently, it has been demonstrated that large-scale virtual screening provides a powerful approach to accelerate the identification of novel drug candidates by screening ever-increasing virtual ligand libraries, which have reached magnitudes of > 10^20 compounds. However, this desirable increase in potentially bioactive molecules poses a new challenge, as enumerating and virtually screening such huge compound libraries is computationally prohibitive. Consequently, advanced approaches to navigate ultra-large chemical spaces and to identify suitable candidate molecules therein are urgently needed. Here, we present an evolutionary algorithm framework using molecular generative AI, reaction-based substructure searching, and iterative model fine-tuning for a targeted and efficient exploration of chemical fragment spaces. Combining this approach with large-scale virtual screening, we are able to identify target-specific candidate molecules within the commercially available Enamine REAL Space (~10^15). We demonstrate the applicability of the approach by successfully identifying and biochemically validating pH-specific ligands of the µ-opioid receptor. Our results demonstrate that integrating generative AI with evolutionary algorithms provides a promising route to explore ultra-large chemical spaces for the discovery of novel, synthetically accessible lead molecules.
Matsuda, K.; Moriya, Y.; Xu, L.; Ohmagari, R.; Aramaki, S.; Zhang, C.; Baba, A.; Hirayama, S.; Kahyo, T.; Setou, M.
Ubiquitin-like protein 3 (UBL3) is a post-translational modifier that sorts proteins into small extracellular vesicles and regulates the trafficking of disease-associated proteins such as α-synuclein. The structural and dynamic features of the UBL domain that underlie these functions, however, remain poorly understood. Here we performed in silico structural dynamics analysis of the UBL3 UBL domain using an NMR structure ensemble combined with anisotropic network modeling (ANM) and perturbation response scanning (PRS). Principal component analysis and residue-wise fluctuation analysis consistently revealed high flexibility in the C-terminal region of UBL3. Comparative ANM analysis across 20 ubiquitin-like proteins (UBLs) further showed that C-terminal flexibility is a conserved yet variable property within the UBL family. PRS analysis demonstrated that residues forming the central α-helix of the β-grasp fold exert greater dynamic control over collective motions than β-sheet residues. Notably, UBL3 displayed the highest helix/sheet PRS effectiveness ratio among all UBLs analyzed, highlighting the prominent dynamic contribution of helix residues in this domain. Together, these results provide a structural basis for understanding UBL3-dependent protein interactions and disease-related mechanisms, and suggest that helix-centered dynamic control in the UBL domain may represent a potential target for modulating UBL3 function.
Garcia-Ruano, D.; Georges, M.; Mohanty, S. K.; Baaziz, R.; Makova, K. D.; Nikolski, M.; Chalopin, D.
Background: Long non-coding RNAs (lncRNAs) have gained significant attention in recent years, yet distinguishing them from protein-coding transcripts remains challenging. Indeed, many lncRNAs share mRNA-like processing, and existing sequence-derived signals do not fully capture the coding/non-coding boundary. Recent GENCODE annotation efforts revealed tens of thousands of novel lncRNA sequences as well as the reclassification of some lncRNAs into the protein-coding class, highlighting the need to better characterize transcript features associated with classification uncertainty and errors. Results: We performed uncertainty-aware benchmarking by retraining and evaluating eight transcript classifiers under a controlled protocol on a label-stable GENCODE v46-v47 subset. Beyond conventional model evaluation metrics, we quantified inter-tool agreement and entropy-based uncertainty to stratify transcripts into consensus, discordant, and consensus-error groups. To expand standard sequence and ORF-derived signals, we incorporated repeat-derived features from mature transcripts and non-B DNA motif features across gene bodies. Although aggregate performance was high, ~45% of transcripts showed inter-tool discordance, particularly among lncRNAs. Feature analyses linked low-uncertainty predictions to strong coding-like signals, whereas high-uncertainty profiles exhibited mixed signatures. Alongside classical predictors in global importance analyses, repeat-derived features appear as main contributors. Conclusions: By combining controlled benchmarking with transcript-level agreement and uncertainty stratification, together with extended feature profiling, we identified patterns associated with classifier disagreement and misclassification. This novel framework provides practical guidance for interpreting predictions, motivating the development of more robust coding/non-coding classifiers, while also shedding light on the sequence properties that distinguish lncRNA sequences.
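One simple way to turn inter-tool agreement into an entropy-based uncertainty and a three-way stratification is sketched below. It assumes each tool emits a binary call (1 = coding, 0 = non-coding) and that "consensus" means unanimous agreement; both the encoding and the unanimity rule are assumptions for illustration, and the study's exact definitions may differ.

```python
import math

def vote_entropy(votes):
    """Binary Shannon entropy (in bits) of the fraction of tools voting 1."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0  # unanimous calls carry no disagreement
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def stratify(votes, label):
    """Assign a transcript to consensus, discordant, or consensus-error.

    votes: per-tool binary calls; label: reference annotation (0 or 1).
    """
    if len(set(votes)) > 1:
        return "discordant"
    return "consensus" if votes[0] == label else "consensus-error"
```

For example, eight tools split four to four give the maximum entropy of 1 bit and a "discordant" label, while eight identical calls that contradict the annotation give zero entropy but land in "consensus-error".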
Katsman, E.; Isaac, S.; Darwish, A.; Maoz, M.; Inbar, M.; Marouani, M.; Unterman, I.; Gugenheim, A.; Salaymeh, N.; Abu Khdeir, S.; Uziely, B.; Peretz, T.; Kaduri, L.; Hubert, A.; Cohen, J. E.; Salah, A.; Temper, M.; Sela, T.; Grinshpun, A.; Zick, A.; Berman, B. P.; Eden, A.
Liquid biopsy using ultra-low-pass whole-genome sequencing (ULP-WGS, ~0.25x coverage) is a promising tool to detect circulating tumor DNA (ctDNA) for cancer management, and the use of the native Oxford Nanopore (ONT) sequencing platform adds DNA methylation to the set of detectable features. Here, we test the performance of methylation-based cell-type deconvolution in ULP-WGS samples from diverse epithelial malignancies and investigate several new computational strategies using our CelFiE-ISH deconvolution framework. We find that incorporating larger numbers of markers restricted to the epithelial cell lineage can reduce the cancer fraction limit of detection down to 1.7-3.1%, matching or exceeding the 3% floor of established copy-number alteration (CNA) benchmarks. Our study provides a useful strategy for analysis of ULP-WGS ONT data and indicates that marker selection remains a key challenge for analyzing methylation-based cancer datasets.
Chowdhury, T. D.; Shafoyat, M. U.; Hemel, N. H.; Nizam, D.; Sajib, J. H.; Toha, T. I.; Nyeem, T. A.; Farzana, M.; Haque, S. R.; Hasan, M.; Siddiquee, K. N. e. A.; Mannoor, K.
Alzheimer's disease remains a major therapeutic challenge, and no β-secretase (BACE1) inhibitor has achieved clinical approval. A key limitation of prior discovery efforts is reliance on single-parameter optimization, often resulting in candidates with limited translational potential. In this study, we developed a biology-informed computational framework integrating meta-ensemble QSAR modeling, molecular docking, Protein Language Model (ESM-1b)-guided residue interaction weighting, and ADMET profiling within a normalized multi-parameter ranking scheme. Model performance was validated using cross-validation, external validation, and Y-randomization (n = 100; p = 0.009), while applicability domain analysis based on Tanimoto similarity highlighted reduced reliability for extrapolative predictions. Sensitivity analysis showed high ranking stability under moderate perturbations (Spearman ρ = 0.998 for ±10%; 0.963 for ±25%), with reduced agreement under randomized weighting (ρ = 0.821), indicating that prioritization is robust but influenced by weight selection. Screening of 16,196 compounds identified 153 predicted actives (accuracy = 0.852; ROC-AUC = 0.920), which were refined to 111 candidates and seven prioritized leads. Molecular dynamics simulations (200 ns) indicated stable binding and persistent catalytic interactions, with Mol-2 showing favorable dynamic stability and ADMET characteristics. Overall, this study presents an interpretable and quantitatively evaluated framework for multi-parameter compound prioritization, supporting more reliable virtual screening in early-stage CNS drug discovery.
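An applicability-domain check based on Tanimoto similarity can be sketched in a few lines. The sketch assumes each fingerprint is represented as a set of on-bit indices and that a query counts as in-domain when its nearest training compound exceeds a similarity threshold; the representation, the function names, and the 0.3 threshold are hypothetical and not taken from the study.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as
    collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_domain(query_fp, training_fps, threshold=0.3):
    """True if the most similar training fingerprint reaches `threshold`.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    return max(tanimoto(query_fp, t) for t in training_fps) >= threshold
```

Predictions for queries that fail such a check are extrapolations beyond the training chemistry, which is the situation the abstract flags as less reliable.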
Tartaglia, J.; Giorgioni, M.; Cattivelli, L.; Faccioli, P.
Background: Advances in high-throughput DNA sequencing technologies have dramatically reduced the time and cost required to generate genomic data. As sequencing is no longer a limiting factor, increasing attention must be paid to optimizing the analyses of the large-scale datasets produced. Efficient processing of such data is essential to reduce computational time and operational costs. In this context, workflow management systems (WMSs) have become key instruments for orchestrating complex bioinformatic pipelines. Among these systems, Nextflow has emerged as one of the most widely adopted solutions in bioinformatics. Methods: To improve scalability and computational efficiency, we employed Nextflow to re-design an existing pipeline dedicated to the analysis of MNase-defined cistrome-Occupancy (MOA-seq) data. The re-engineering process focused on modularizing the workflow and integrating containerization technologies to ensure reproducibility and easier deployment across heterogeneous computing environments. Results: The resulting workflow, named MOAflow, represents a modernized and fully containerized pipeline for MOA-seq data analysis. With only Docker and Nextflow required, the pipeline guarantees high portability and reproducibility. Data from the original article were used to benchmark the new pipeline. Its outputs closely match those of the original study, with minor variations. Conclusions: MOAflow demonstrates how the adoption of a robust WMS can substantially enhance the performance and usability of pre-existing bioinformatic pipelines. By leveraging containerization and Nextflow, it ensures consistent results across platforms while minimizing setup complexity. This work highlights the value of modern WMS-driven approaches in meeting the computational demands.
Joubert, P. M.; Sanabria, M.; Poetsch, A. R.
Genome stability is shaped by DNA sequence and chromatin context, but their relative contributions to double-strand break (DSB) sensitivity remain unclear. We show that the DNA language model GROVER can infer DSB location from sequence alone. DSB hotspots tend to contain GC-rich sequences that belong to promoters, genes and short interspersed nuclear elements (SINEs). Additionally, we identified several specific short sequences (tokens) that are associated with modulating DSB sensitivity. Another model using chromatin and genome regulatory features outperforms the sequence-only model, highlighting complementary and cell-type specific information. Integrating sequence and genome biological features yields the best performance, demonstrating their synergy. Analyzing this model revealed that, depending on the sample, genome stability information encoded in H3K36me3 and DNase-seq can be learned from the sequence, whereas that encoded in H3K27ac or H3K9me3 cannot. Embedding chromatin data directly into the GROVER architecture enabled cell-type specific modeling with performance matching the full chromatin feature model. Our results suggest that while chromatin and regulatory context provide important information, such as cell-type specificity, much of the information shaping DSB patterns is already encoded in the DNA sequence itself. Our integrative modeling approach not only reveals DSB patterns but also provides a generalizable strategy for tracing predictions in genomic data.
Garcia, J. J.; Yu, K. M.; Freudenreich, C. H.; Cowen, L.
In baker's yeast, a comprehensive collection of pairwise epistasis experiments measures, for nearly every pair of non-essential genes, the growth of the double-knockout strain compared with its component single knockouts. These data can be represented as a weighted signed graph termed the genetic interaction network. We introduce a new integer linear programming (ILP)-based method named GIDEON to search for a diverse collection of Between-Pathway Models (BPMs) in this network, where a BPM is a graph motif signature that indicates potential compensatory pathways. With both an improved distribution-informed edge weighting scheme and an improved ILP method, GIDEON produces BPM collections that are substantially larger and show better functional enrichment than those of previous methods. We find new BPM gene sets, including one with potential insights into antifungal drug targets through ties between ergosterol and aromatic amino acid biosynthesis.
Liu, T.; Jiang, S.; Zhang, F.; Sun, K.; Head-Gordon, T.; Zhao, H.
Large language models (LLMs) are in the ascendancy for research in drug discovery, offering opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages and limitations relative to traditional drug discovery platforms. To tackle this emergent problem, we have developed DrugPlayGround, a framework to evaluate and benchmark LLM performance in generating meaningful text-based descriptions of physicochemical drug characteristics, drug synergism, drug-protein interactions, and the physiological response to perturbations introduced by drug molecules. Moreover, DrugPlayGround is designed to work with domain experts to provide detailed explanations justifying the predictions of LLMs, thereby testing LLMs for chemical and biological reasoning capabilities and supporting their wider use at all stages of drug discovery.
Slenders, E.; Perego, E.; Zappone, S.; Vicidomini, G.
Fluorescence fluctuation spectroscopy (FFS) is an ensemble of techniques for quantitative measurement of molecular dynamics and interactions. Recently, the introduction of small-format array detectors has opened up a new range of spatiotemporal information, allowing for more detailed analysis of system kinetics. However, there is currently no open-source software available for analyzing the high-dimensional FFS data sets. We present BrightEyes-FFS, an open-source Python-based environment for FFS analysis with array detectors. The environment includes a Python package for reading raw FFS data, computing auto- and cross-correlations using various algorithms, and fitting the correlations to several models. A graphical user interface (GUI), available as a standalone executable, makes the analysis fast and user-friendly. An automated Jupyter Notebook writing tool enables transition from the GUI to Jupyter Notebook for custom analysis. We believe that BrightEyes-FFS will enable a wider community to study diffusion, flow, and interaction dynamics.
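The auto- and cross-correlations mentioned above can be illustrated with a minimal sketch of the standard normalized fluctuation correlation, G(τ) = ⟨δF_a(t) δF_b(t+τ)⟩ / (⟨F_a⟩⟨F_b⟩). This is not the BrightEyes-FFS API: the function names and the direct (non-multi-tau) summation are assumptions chosen for clarity, and each trace stands in for the intensity recorded by one hypothetical detector element.

```python
def correlate(trace_a, trace_b, max_lag):
    """Normalized fluctuation cross-correlation for lags 1..max_lag.

    Both traces must have the same length, a non-zero mean, and more
    samples than max_lag. Direct O(n * max_lag) summation, for clarity only.
    """
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same length")
    n = len(trace_a)
    mean_a = sum(trace_a) / n
    mean_b = sum(trace_b) / n
    g = []
    for lag in range(1, max_lag + 1):
        m = n - lag  # number of overlapping sample pairs at this lag
        acc = sum((trace_a[i] - mean_a) * (trace_b[i + lag] - mean_b)
                  for i in range(m))
        g.append(acc / (m * mean_a * mean_b))
    return g

def autocorrelation(trace, max_lag):
    """Autocorrelation is the cross-correlation of a trace with itself."""
    return correlate(trace, trace, max_lag)

if __name__ == "__main__":
    # Synthetic alternating signal: anti-correlated at odd lags,
    # correlated at even lags.
    trace = [1, 3] * 4
    print(autocorrelation(trace, 2))
```

With an array detector, the same `correlate` call applied to traces from two different elements gives a spatial cross-correlation. For long recordings, practical tools typically use faster schemes such as FFT-based or multi-tau correlation rather than this direct sum.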